Gaussian Process Planning with Lipschitz Continuous Reward Functions: Towards Unifying Bayesian Optimization, Active Learning, and Beyond
Authors
Abstract
This paper presents a novel nonmyopic adaptive Gaussian process planning (GPP) framework endowed with a general class of Lipschitz continuous reward functions that can unify some active learning/sensing and Bayesian optimization criteria and offer practitioners some flexibility to specify their desired choices for defining new tasks/problems. In particular, it utilizes a principled Bayesian sequential decision problem framework for jointly and naturally optimizing the exploration-exploitation trade-off. In general, the resulting induced GPP policy cannot be derived exactly due to an uncountable set of candidate observations. A key contribution of our work here thus lies in exploiting the Lipschitz continuity of the reward functions to solve for a nonmyopic adaptive ε-optimal GPP (ε-GPP) policy. To plan in real time, we further propose an asymptotically optimal, branch-and-bound anytime variant of ε-GPP with performance guarantee. We empirically demonstrate the effectiveness of our ε-GPP policy and its anytime variant in Bayesian optimization and an energy harvesting task.
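The abstract states the core trick only at a high level: because the reward is Lipschitz continuous in the observation, an expectation over the uncountable set of candidate observations can be approximated to within a chosen tolerance ε using a finite grid. The sketch below is a minimal, one-step-lookahead illustration in Python/NumPy, not the authors' nonmyopic ε-GPP algorithm; helper names such as gp_posterior and expected_reward_eps are invented for this example, and the 1-Lipschitz improvement reward is just one admissible choice from the general reward class.

```python
# Minimal sketch (not the authors' implementation): when the reward is
# L-Lipschitz in the observation, the expected reward under the GP
# predictive distribution can be approximated by a finite grid whose
# spacing is eps / L, giving a discretization error of order eps.
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2, signal_var=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    d = A[:, None] - B[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise_var=1e-3):
    """Gaussian process posterior mean and variance at query inputs."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    K_ss = rbf_kernel(x_query, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 1e-12)

def improvement_reward(z, incumbent):
    """A 1-Lipschitz reward in the observation z (improvement over the incumbent)."""
    return np.maximum(z - incumbent, 0.0)

def expected_reward_eps(mean, var, incumbent, lipschitz=1.0, eps=0.01):
    """Approximate E[r(Z)] for Z ~ N(mean, var) on a grid with spacing eps / lipschitz."""
    std = np.sqrt(var)
    spacing = eps / lipschitz
    grid = np.arange(mean - 4 * std, mean + 4 * std + spacing, spacing)
    pdf = np.exp(-0.5 * ((grid - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))
    weights = pdf * spacing
    weights /= weights.sum()  # guard against coarse grids at low-variance points
    return np.sum(weights * improvement_reward(grid, incumbent))

# Toy use: pick the next sampling location greedily (myopic, horizon 1).
rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x) + 0.5 * x                 # hidden objective
x_train = rng.uniform(0, 2, size=4)
y_train = f(x_train)
candidates = np.linspace(0, 2, 200)
mean, var = gp_posterior(x_train, y_train, candidates)
incumbent = y_train.max()
scores = [expected_reward_eps(m, v, incumbent) for m, v in zip(mean, var)]
x_next = candidates[int(np.argmax(scores))]
print("next location to sample:", x_next)
```

The full ε-GPP policy applies this kind of discretization nonmyopically over a multi-stage lookahead, and the branch-and-bound anytime variant prunes the resulting search using reward bounds; neither extension is attempted in this sketch.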
Similar papers
Active reward learning with a novel acquisition function
Reward functions are an essential component of many robot learning methods. Defining such functions, however, remains hard in many practical applications. For tasks such as grasping, there are no reliable success measures available. Defining reward functions by hand requires extensive task knowledge and often leads to undesired emergent behavior. We introduce a framework, wherein the robot simu...
Towards Practical Theory: Bayesian Optimization and Optimal Exploration
This thesis discusses novel principles to improve the theoretical analyses of a class of methods, aiming to provide theoretically driven yet practically useful methods. The thesis focuses on a class of methods, called bound-based search, which includes several planning algorithms (e.g., the A* algorithm and the UCT algorithm), several optimization methods (e.g., Bayesian optimization and Lipsch...
On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems
The ability to compute an accurate reward function is essential for optimising a dialogue policy via reinforcement learning. In real-world applications, using explicit user feedback as the reward signal is often unreliable and costly to collect. This problem can be mitigated if the user’s intent is known in advance or data is available to pre-train a task success predictor off-line. In practice...
Verifying Controllers Against Adversarial Examples with Bayesian Optimization
Recent successes in reinforcement learning have led to the development of complex controllers for real-world robots. As these robots are deployed in safety-critical applications and interact with humans, it becomes critical to ensure safety in order to avoid causing harm. A first step in this direction is to test the controllers in simulation. To be able to do this, we need ...
Threshold Learning for Optimal Decision Making
Decision making under uncertainty is commonly modelled as a process of competitive stochastic evidence accumulation to threshold (the drift-diffusion model). However, it is unknown how animals learn these decision thresholds. We examine threshold learning by constructing a reward function that averages over many trials to Wald’s cost function that defines decision optimality. These rewards are ...
Publication date: 2016